I sometimes run into errors like TypeError: unhashable type: 'list' while programming, so I decided to set aside some time to understand three important Python concepts: mutable, hashable and iterable.

1. mutable and immutable

Python represents all its data as objects. Each object is identified by an integer that is unique and constant during the object's lifetime. The built-in function id(object) returns the identity of a given object.

Python objects fall into two categories: mutable and immutable. The content of a mutable object can be altered without changing its identity. One trick to check whether a type is mutable is to compare id(object) before and after an operation that modifies the value. For instance,

# immutable type, str()
>>> s = 'abc'
>>> id(s)
140125615331648
>>> s += 'def'      # a new object is created
>>> id(s)
140125614721616     # the id is changed

# mutable type, set()
>>> s = set(['a', 'b'])
>>> id(s)
140125614845184
>>> s.add('c')
>>> id(s)
140125614845184

Note that an immutable object cannot actually be changed: an operation that appears to modify it creates a new object instead. (A new object has to be created whenever a different value has to be stored.)
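A practical consequence of the identity rule shows up when two names refer to the same object. The following is a minimal sketch (the variable names are only illustrative): an in-place change to a list is visible through every name bound to it, while "+=" on a tuple rebinds one name to a new object and leaves the other untouched.

```python
# Two names bound to the same mutable object see the same in-place change.
a = [1, 2]
b = a                                # b is another name for the same list
id_before = id(a)
b.append(3)                          # in-place mutation through b
same_identity = id(a) == id_before   # the list object itself is unchanged

# With an immutable tuple, "+=" builds a new object and rebinds the name.
t = (1, 2)
u = t
u += (3,)                            # u now refers to a new tuple; t is untouched

print(a, same_identity)              # a also shows the appended item
print(t, u, u is t)
```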

The principal built-in types in Python are numerics, sequences, mappings, classes, instances and exceptions.

(1) immutable types

  • numbers: int(), float(), complex()
  • sequences: str(), tuple(), bytes()
  • set types: frozenset()

(2) mutable types

  • sequences: list(), bytearray()
  • set types: set()
  • mapping types: dict(), collections.OrderedDict([items])
  • classes, instances and exceptions
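For the sequence types in the lists above, one quick way to see the difference is to attempt an item assignment. This is only a sketch: supports_item_assignment is a hypothetical helper written for this post, and it is meaningful for sequences only (sets and frozensets do not support indexing at all, so it says nothing about them).

```python
def supports_item_assignment(obj):
    """Hypothetical helper: try to overwrite the first item in place."""
    try:
        obj[0] = 120          # an int, so that bytearray accepts the value
    except TypeError:
        return False          # immutable sequences reject item assignment
    return True

results = {
    "str": supports_item_assignment("abc"),
    "tuple": supports_item_assignment((1, 2, 3)),
    "bytes": supports_item_assignment(b"abc"),
    "list": supports_item_assignment([1, 2, 3]),
    "bytearray": supports_item_assignment(bytearray(b"abc")),
}
print(results)
```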

2. hashable and unhashable

The following description of hashable is excerpted from the glossary of the Python documentation.

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
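This is where the TypeError from the introduction comes from. A small sketch, with made-up variable names: a tuple works as a dictionary key, a list does not, and converting the list to a tuple is one common workaround.

```python
labels = {}
labels[(1, 2)] = "A"                 # a tuple of ints is hashable

try:
    labels[[1, 2]] = "B"             # a list is not hashable
except TypeError as exc:
    message = str(exc)               # mentions: unhashable type: 'list'

coords = [3, 4]
labels[tuple(coords)] = "C"          # workaround: use an immutable copy as the key

# The same rule applies to set members.
try:
    members = {[1, 2]}
except TypeError:
    members = {(1, 2)}

print(message)
print(labels)
```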

Most of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not; immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
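The behavior of user-defined classes can be sketched as follows. Plain, EqOnly and Point are hypothetical classes made up for illustration. One detail the excerpt does not spell out: a class that defines __eq__() without __hash__() has its __hash__ set to None, so its instances become unhashable.

```python
class Plain:
    """No __eq__ or __hash__: instances are hashable by default."""

class EqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""
    def __eq__(self, other):
        return isinstance(other, EqOnly)

class Point:
    """Equal points hash equally, so they can act as the same dict key."""
    def __init__(self, x, y):
        self._x, self._y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        return hash((self._x, self._y))

p1, p2 = Plain(), Plain()
plain_equal = p1 == p2               # distinct instances compare unequal
plain_set = {p1, p2}                 # both are usable as set members

eq_only_unhashable = EqOnly.__hash__ is None

a, b = Point(1, 2), Point(1, 2)
point_set = {a, b}                   # equal value and equal hash: one member
```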

3. iteration, iterable and iterator

Excerpt from [2]:

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
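The __getitem__ route is the less familiar one, so here is a minimal sketch of it. Countdown is a hypothetical class with no __iter__ method at all; iter() still accepts it because it can be indexed from zero and raises IndexError at the end.

```python
class Countdown:
    """Hypothetical class that is iterable only through __getitem__."""
    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index < 0 or index >= self.start:
            raise IndexError(index)  # tells the iteration machinery to stop
        return self.start - index

has_iter_method = hasattr(Countdown, "__iter__")
items = list(Countdown(3))           # indexes 0, 1, 2 are requested in turn
iterator = iter(Countdown(2))        # iter() builds an iterator for it
first = next(iterator)

print(has_iter_method, items, first)
```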

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
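What a for loop does behind the scenes can be written out by hand. The function manual_for below is a hypothetical name used only for this sketch; it is roughly equivalent to "for item in iterable: collected.append(item)".

```python
def manual_for(iterable):
    """Collect the items of an iterable using iter() and next() explicitly."""
    collected = []
    iterator = iter(iterable)        # step 1: get an iterator from the iterable
    while True:
        try:
            item = next(iterator)    # step 2: ask for the next item
        except StopIteration:
            break                    # step 3: the iterator is exhausted
        collected.append(item)
    return collected

letters = manual_for("cat")
squares = manual_for(x * x for x in range(4))
print(letters, squares)
```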

A good example from [2] illustrates these concepts:

>>> s = 'cat'      # s is an ITERABLE
                   # s is a str object that is immutable
                   # s has no state
                   # s has a __getitem__() method 

>>> t = iter(s)    # t is an ITERATOR
                   # t has state (it starts by pointing at the "c")
                   # t has a __next__() method (next() in Python 2) and an __iter__() method

>>> next(t)        # the next() function returns the next value and advances the state
'c'
>>> next(t)        # the next() function returns the next value and advances
'a'
>>> next(t)        # the next() function returns the next value and advances
't'
>>> next(t)        # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration

>>> iter(t) is t   # the iterator is self-iterable
True

Excerpt from iterable:

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method.

Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.

Excerpt from iterator:

An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream.

When no more data are available, a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again.

Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.
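A hand-written iterator makes the two required methods concrete. UpTo is a hypothetical class, given here as a sketch of the protocol rather than a recommended design (a generator function is usually shorter).

```python
class UpTo:
    """Hypothetical iterator that yields 1, 2, ..., limit."""
    def __init__(self, limit):
        self.limit = limit
        self.current = 0             # the iterator's internal state

    def __iter__(self):
        return self                  # an iterator returns itself

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration      # raised again on every further call
        self.current += 1
        return self.current

values = list(UpTo(3))

it = UpTo(2)
is_self_iterable = iter(it) is it
total = sum(it)                      # usable wherever an iterable is accepted
print(values, is_self_iterable, total)
```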

One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
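The difference between a container and an iterator over it is easy to see in a short sketch (the variable names are illustrative):

```python
numbers = [1, 2, 3]

# A list hands out a fresh iterator each time, so every pass sees all items.
list_pass_1 = [n for n in numbers]
list_pass_2 = [n for n in numbers]

# An iterator can be consumed only once.
it = iter(numbers)
iter_pass_1 = [n for n in it]
iter_pass_2 = [n for n in it]        # already exhausted: looks like an empty container

print(list_pass_1, list_pass_2)
print(iter_pass_1, iter_pass_2)
```

If several passes are needed, keep the container itself (or build a list from the iterator once) and iterate over that.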

References:

[1] Immutable vs mutable types - Python

[2] What exactly are Python's iterator, iterable, and iteration protocols?

This article is original work by Spark & Shine; please credit the source when reposting. Last modified: 2022-03-16 15:43
